Classification of Verb Particle Constructions with the Google Web1T Corpus
Abstract
Manually maintaining comprehensive databases of multi-word expressions, such as Verb-Particle Constructions (VPCs), is infeasible. We describe a new type-level classifier for potential VPCs, which uses information in the Google Web1T corpus to perform a simple linguistic constituency test. Specifically, we consider the fronting test, comparing the frequencies of the two possible orderings of the given verb and particle. Using only a small set of queries for each verb-particle pair, the system achieved an F-score of 75.7% in our evaluation while processing thousands of queries per second.
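The frequency comparison behind the fronting test can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the actual Web1T queries, so the bigram lookups, the function names, the 0.05 threshold, and the toy counts below are all assumptions made for illustration. The idea shown is only that a pair is flagged as a potential VPC when the particle-first ordering is rare compared with the verb-first ordering.

```python
from typing import Mapping

# Toy n-gram counts invented for illustration; they are NOT Web1T figures.
TOY_COUNTS = {
    "give up": 990,
    "up give": 10,
    "walk in": 600,
    "in walk": 400,
}


def fronted_share(counts: Mapping[str, int], verb: str, particle: str) -> float:
    """Fraction of the two orderings in which the particle precedes the verb.

    Assumption: plain bigram lookups stand in for whatever query patterns
    the real system uses. Returns 0.0 when neither ordering is attested.
    """
    canonical = counts.get(f"{verb} {particle}", 0)
    fronted = counts.get(f"{particle} {verb}", 0)
    total = canonical + fronted
    if total == 0:
        return 0.0
    return fronted / total


def is_potential_vpc(
    counts: Mapping[str, int],
    verb: str,
    particle: str,
    max_fronted_share: float = 0.05,  # hypothetical threshold, not from the paper
) -> bool:
    """Flag a verb-particle pair when the fronted ordering is rare.

    A pair whose verb-first ordering never occurs is rejected outright,
    since there is no evidence to compare.
    """
    if counts.get(f"{verb} {particle}", 0) == 0:
        return False
    return fronted_share(counts, verb, particle) <= max_fronted_share


if __name__ == "__main__":
    for verb, particle in [("give", "up"), ("walk", "in"), ("sing", "off")]:
        print(verb, particle, is_potential_vpc(TOY_COUNTS, verb, particle))
```

Because the decision needs only two lookups per pair, a check of this shape is cheap to run in bulk; the threshold would need tuning on labelled data before any real use.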
Similar resources
Statistical Techniques for Automatically Inferring the Semantics of Verb-Particle Constructions
This paper describes an investigation of some potential features for a statistical approach to inferring the semantics of verb-particle constructions from corpus data. Verb-particles cause particular problems for the computational semantic analysis of language, because their meaning often cannot be derived through the usual compositional methods of analysis. Two novel techniques are presented w...
Automatic Identification Of English Verb Particle Constructions Using Linguistic Features
This paper presents a method for identifying token instances of verb particle constructions (VPCs) automatically, based on the output of the RASP parser. The proposed method pools together instances of VPCs and verb-PPs from the parser output and uses the sentential context of each such instance to differentiate VPCs from verb-PPs. We show our technique to perform at an F-score of 97.4% at iden...
A Statistical Approach To The Semantics Of Verb-Particles
This paper describes a distributional approach to the semantics of verb-particle constructions (e.g. put up, make off ). We report first on a framework for implementing and evaluating such models. We then go on to report on the implementation of some techniques for using statistical models acquired from corpus data to infer the meaning of verb-particle constructions.
Verb-Particle Constructions in the World Wide Web
In this paper we investigate the phenomenon of verb-particle constructions, discussing their characteristics and their availability for use with NLP systems. Combinations automatically extracted from corpora greatly improve the coverage of available resources. However, the data sparseness problem is particularly acute for these constructions and even using a corpus as large as the British Natio...
USYD: WSD and Lexical Substitution using the Web1T corpus
This paper describes the University of Sydney’s WSD and Lexical Substitution systems for SemEval-2007. These systems are principally based on evaluating the substitutability of potential synonyms in the context of the target word. Substitutability is measured using Pointwise Mutual Information as obtained from the Web1T corpus. The WSD systems are supervised, while the Lexical Substitution syst...